Mining simple and complex patterns efficiently using Binary Decision Diagrams

نویسنده

  • Elsa Loekito
چکیده

Pattern mining is a knowledge discovery task which is useful for finding interesting data characteristics. Existing mining techniques sometimes suffer from limited performance in challenging situations, such as when finding patterns in high-dimensional datasets. Binary Decision Diagrams and their variants are a compact and efficient graph data structure for representing and manipulating boolean functions and they are potentially attractive for solving many problems in pattern mining. This thesis explores techniques for the use of binary decision diagrams for mining both simple and complex types of patterns. Firstly, we investigate the use of Binary Decision Diagrams for mining the fundamental types of patterns. These include frequent patterns, also known as frequent itemsets. We introduce a structure called the Weighted Zero-suppressed Binary Decision Diagram and evaluate its use on high dimensional data. This type of Decision Diagram is extremely useful for re-using intermediate patterns during computation. Secondly, we study the problem of mining patterns in sequential databases. Here, we introduce a new structure called the Sequence Binary Decision Diagram, which can be used for mining frequent subsequences. We show that our technique is competitive with the state of the art and identify situations where it is superior. Thirdly, we show how Weighted Zero-suppressed Binary Decision Diagrams can be used for discovering new and complex types of patterns. We introduce new types of highly expressive patterns for capturing contrasts, which express disjunctions of attribute values. Moreover, to investigate the usefulness of disjunctive patterns for knowledge discovery, we employ a statistical methodology for testing their significance, and study their use for solving classification problems. Our findings show that classifiers based on significant disjunctive patterns can be more robust than those which are only based on simple patterns.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Improved Algorithm for Network Reliability Evaluation

Binary Decision Diagram (BDD) is a data structure proved to be compact in representation and efficient in manipulation of Boolean formulas. Using Binary decision diagram in network reliability analysis has already been investigated by some researchers. In this paper we show how an exact algorithm for network reliability can be improved and implemented efficiently by using CUDD - Colorado Univer...

متن کامل

Symmetric Item Set Mining Using Zero-suppressed BDDs

(Abstract) In this paper, we propose a method for discovering hidden information from large-scale item set data based on the symmetry of items. Symmetry is a fundamental concept in the theory of Boolean functions, and there have been developed fast symmetry checking methods based on BDDs (Binary Decision Diagrams). Here we discuss the property of symmetric items in data mining problems, and des...

متن کامل

Suffix-DDs: Substring Indices Based on Sequence BDDs for Constrained Sequence Mining

In this paper, we study an efficient index structure, called Suffix Decision Diagrams (SuffixDDs), for knowledge discovery in large sequence data. Recently, Loekito, Bailey, and Pei (KAIS, 2009) proposed a new data structure for sequence data, called Sequence Binary Decision Diagram (SeqBDD), which is an extension of Zero-suppressed Binary Decision Diagrams (ZDDs) for sequences. SuffixDD is a c...

متن کامل

Knowledge Compilation for Itemset Mining

Mining frequently occurring patterns or itemsets is a fundamental task in datamining. Many ad-hoc itemset mining algorithms have been proposed for enumerating frequent, maximal and closed itemsets. The datamining community has been particularly interested in finding itemsets that satisfy additional constraints, which is a challenging task for existing techniques. In this paper we present a nove...

متن کامل

Are Zero-suppressed Binary Decision Diagrams Good for Mining Frequent Patterns in High Dimensional Datasets?

Mining frequent patterns such as frequent itemsets is a core operation in many important data mining tasks, such as in association rule mining. Mining frequent itemsets in high-dimensional datasets is challenging, since the search space is exponential in the number of dimensions and the volume of patterns can be huge. Many of the state-of-the-art techniques rely upon the use of prefix trees (e....

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009